Building trees interactively

The methodology described so far has assumed that trees are built automatically using the induction feature. Under that approach, the only control you have over the development of the tree is through the forward pruning parameters.

For most users wishing to discover patterns of interest in data, the above methodology is satisfactory. However, for advanced users wishing to engineer knowledge out of data for decision-making applications, MinedTree objects provide interactive induction and pruning.

When building a tree you can use the Induce Single Node function to add a split. You will then be presented with a list of the attributes available for splitting, ranked by entropy or significance for discrete outcomes and by normalised standard deviation for numeric outcomes. Automatic induction would normally select the highest ranking attribute. With interactive rule induction (development), you make the selection yourself, weighing the ranking of the attributes against the implications for your organisation of splitting on them. You can select any attribute, change the numeric threshold for a split, and change the grouping of attribute values between the two branches.

Let the significance of each split guide your choices. You now control pruning interactively, so stop growing the tree when significance is low or when branches contain too few data records. You can specify a 'Min examples in a branch' parameter to identify splits whose branches fall below a given number of data records (rows). Attributes which fall below this threshold still appear in the list, with 'Min' against them to indicate that they are below it.
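The ranking step for a discrete outcome can be illustrated with a short Python sketch. This is not the product's implementation: the exact ranking formula is not given here, so the sketch assumes Shannon entropy (information gain) and considers only "one value versus the rest" binary splits. The function names, the sample data and the way the 'Min' flag is stored are all hypothetical. Numeric outcomes and significance tests are not covered.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of outcome labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def rank_attributes(rows, outcome, min_examples=1):
    """Rank attributes by the gain of their best one-value-vs-rest split.

    Returns (attribute, value, gain, flag) tuples, highest gain first.
    flag is 'Min' when the smaller branch holds fewer than min_examples
    rows; flagged splits stay in the list (assumed behaviour).
    """
    base = entropy([r[outcome] for r in rows])
    ranked = []
    for attr in (a for a in rows[0] if a != outcome):
        best = None
        for value in sorted({r[attr] for r in rows}):
            left = [r[outcome] for r in rows if r[attr] == value]
            right = [r[outcome] for r in rows if r[attr] != value]
            if not right:
                continue  # only one value present, so no split is possible
            weighted = (len(left) * entropy(left)
                        + len(right) * entropy(right)) / len(rows)
            gain = base - weighted
            flag = "Min" if min(len(left), len(right)) < min_examples else ""
            if best is None or gain > best[2]:
                best = (attr, value, gain, flag)
        if best is not None:
            ranked.append(best)
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked

# Hypothetical sample data: six records with a discrete outcome 'churn'.
rows = [
    {"region": "north", "plan": "basic", "churn": "yes"},
    {"region": "north", "plan": "premium", "churn": "yes"},
    {"region": "south", "plan": "basic", "churn": "no"},
    {"region": "south", "plan": "premium", "churn": "no"},
    {"region": "south", "plan": "basic", "churn": "no"},
    {"region": "east", "plan": "basic", "churn": "no"},
]

for attr, value, gain, flag in rank_attributes(rows, "churn", min_examples=3):
    print(f"{attr:8} {value:8} gain={gain:.3f} {flag}")
```

In this sample the split on region ('north' against the rest) ranks first, but its smaller branch holds only two records, so it is flagged 'Min' when the threshold is three. This is the kind of case where an analyst might prefer a lower ranking split or stop growing the branch.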

Interactive induction is made more effective by the facility to develop a number of trees from the same data. Each tree can be developed with different attribute splits at different levels and the various trees can be validated against the test data.
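As a rough illustration of validating alternative trees against the same test data, the Python sketch below scores two hand-built trees by accuracy. The nested-dictionary tree representation, the function names and the test records are assumptions made for this example and do not reflect how MinedTree objects are stored or validated.

```python
def classify(tree, row):
    """Follow binary splits until a leaf (a plain outcome label) is reached."""
    while isinstance(tree, dict):
        branch = "match" if row[tree["attr"]] in tree["values"] else "rest"
        tree = tree[branch]
    return tree

def accuracy(tree, rows, outcome):
    """Fraction of rows whose predicted outcome equals the recorded one."""
    hits = sum(1 for r in rows if classify(tree, r) == r[outcome])
    return hits / len(rows)

# Two hypothetical trees developed from the same data with different splits.
tree_a = {"attr": "region", "values": {"north"}, "match": "yes", "rest": "no"}
tree_b = {"attr": "plan", "values": {"premium"}, "match": "yes", "rest": "no"}

# Hypothetical held-out test records.
test_rows = [
    {"region": "north", "plan": "basic", "churn": "yes"},
    {"region": "south", "plan": "premium", "churn": "no"},
    {"region": "east", "plan": "basic", "churn": "no"},
    {"region": "north", "plan": "premium", "churn": "yes"},
]

scores = {
    name: accuracy(tree, test_rows, "churn")
    for name, tree in [("tree_a", tree_a), ("tree_b", tree_b)]
}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Accuracy is only one possible measure. The same loop could report any validation statistic that suits the application.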

We strongly recommend that the interactive tree building exercise is carried out by an analyst who has a good understanding of the data, by a 'domain expert' who knows the subject the data describes, or preferably by someone with both qualities.